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SUMMARY 

yfe- briefly  drscuss  the  similarities  and  differences  between  two  iterative  estima- 
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tors  that  are  suitable  for  the  network  estimation  problem,  namely  a  modification  of 
the  Iterative  Least-Squares  method  (ILS)  due  to  Schmee  and  Hahn  (1979)  and  the 
Maximum-Likelihood  Estimator  (MLE).  Both  methods  reduce  to  the  usual  Least 
Squares  Multiple  Factors  (LSMF)  method  when  the  censored  data  are  deleted  from  the 
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network  observational  data.  For  the  censored  case,  the  standard  deviation  (o)  of  the 
obscuring  noise  must  be  solved  through  iteration  along  with  the  event  magnitudes  and 
the  station  corrections.  An  extra  constraint  on  o  is  necessary  to  determine  which 
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optimal  estimation  scheme  is  of  interest.  The  final  value  of  for  each  iterative 
scheme  can  be  used  as  a  good  approximation  to  the  unbiased  estimate  of  the  standard 
deviation  of  the  perturbing  noise.  By  scaling  this  p  value  by  the  square  root  of  the 
number  of  observations  associated  with  each  unknown  parameter,  the  uncertainty  in 
each  estimated  parameter  can  be  approximated  efficiently.  These  error  estimates  seem 

to  differ  from  the  unbiased  standard  errors  only  by  a  common  multiplying  constant 
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across  all  stations  and  all  event  m^’s. 

The  bootstrap  method  is  reviewed  and  adapted  to  the  case  of  rr  l‘;  variate  estima¬ 
tion  with  doubly  censored  data.  The  Monte  Carlo  resampling  is  carried  out  among  the 
collection  of  residuals  instead  of  the  observational  data.  The  pool  of  residuals  is 
enlarged  to  include  all  censored  residuals  for  random  drawing.  The  bootstrap  result 
confirms  the  aforementioned  scaling  relationship  between  the  individual  error  estimates 
and  the  global  0  of  the  perturbing  noise.  As  a  result,  the  bootstrap/jackknife  technique 
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may  not  justify  the  considerable  computational  effort  required  for  this  specific  applica¬ 
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phases).  There  is  only  a  narrow  magnitude  range  from  about  5.7 
to  6.5  in  which  the  censoring  effects  of  clipping  or  non-detection 
do  not  lead  to  serious  bias.  Compared  to  MLE,  ILS  tends  to  un¬ 
derestimate  somewhat  the  censoring  effects. 

2b  The  iftb^  versus  rh^  for  110  explosions  (328  phases).  Com-  25 

pared  to  MLE,  ILS  tends  to  underestimate  somewhat  the  censor¬ 
ing  effects. 

3a  The  receiver  effects  of  127  WWSSN  stations  indicate  that  ILS  26 

tends  to  underestimate  the  censoring  effects  due  to  clippings  or 
non-detections,  same  as  in  Figures  2a  and  2b. 

3b  Comparison  of  the  receiver  effects  estimated  by  MLE  and  ELS.  27 

3c  Histogram  of  the  receiver  effects  with  bin  width  0. 1  unit.  Circles,  28 

signs,  and  triangles  represents  the  spread  of  the  127  re¬ 
ceiver  terms  estimated  with  LSMF,  MLE,  and  ILS  methods 
respectively. 
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4a  The  uncertainty  estimated  by  ILS  versus  that  by  ILS  and  29 

bootstrap.  The  abscissa  Error[LS  is  predicted  in  an  analytical 
manner  by  dividing  the  dILS  by  the  square  root  of  number  of  as¬ 
sociated  observations.  The  ELS  plus  bootstrap  method  computes 
the  ordinate  with  statistics  from  600  Monte  Carlo  simulations. 

The  highly  linear  relationship  between  these  two  methods  strong¬ 
ly  indicates  that  ELS  alone  would  give  a  fairly  good  estimate  of 
the  uncertainty,  except  for  the  constant  bias-correcting  factor, 

1.450. 

4b  Same  as  Figure  4a  except  that  MLE  is  used  instead  of  ILS.  The  30 

magnifying  factor  required  in  this  case  is  1.218. 

5a  The  log ] 0V#  of  observations  (including  signals,  noises,  and  clips)  31 

versus  the  log10(uncertainty  estimate).  Filled  circles  represent  the 
results  from  600  bootstrap  iterations.  The  nearly  linear  relation¬ 
ship  (X  +  Y  =  log1()(0.381))  indicates  that  the  product  of  the 
V#  of  observations  and  the  individual  error  estimate  will  be  a 
stable  estimator  of  the  true  cr  of  the  obscuring  perturbations. 

The  lower  curve  (X  +  Y  =  log10(0.263))  gives  the  predicted  un¬ 
certainty  with  <*ils.  There  exists  a  constant  offset  between  the 
predicted  curve  and  the  experimental  curve,  which  can  be  used 
to  test  any  analytical  bias-correction  scheme  to  be  developed. 

5b  Same  as  Figure  5a  except  that  MLE  is  used  instead  of  ILS.  The  32 

linear  relationship  is  as  obvious  as  in  the  ELS  case.  The  offset 
between  the  theoretical  curve  and  the  bootstrap’s  result  is  small¬ 
er,  which  suggests  that  &1LS  requires  more  complicated  bias 
corrections  in  addition  to  the  D.O.F.-adjustment.  The  upper  line 
(X  +  Y  =  log10(0.401))  and  the  lower  line  (X  +  Y  = 
log10(0.329))  will  merge  together  in  the  non-censoring  case. 

6a  The  histogram  of  the  ILS  residuals  with  bin  width  0.1  unit.  Cir-  34 

cles,  “+”  signs,  and  triangles  represents  the  spread  of  7547 
“regular  residuals”,  5012  “noisy  residuals”,  and  950  “clipped 
residuals”,  respectively.  The  ILS  residuals  are  slightly  more  con¬ 
centrated  than  that  of  MLE,  because  dILS  is  smaller  than  0MLE. 

The  interpolated  shape  of  the  histogram  empirically  justifies  the 
idealized  assumption  that  residuals  follow  a  normal  distribution. 
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6b  The  interpolated  histogram  of  the  MLE  residuals  (See  Figure  6a  35 

for  caption). 

6c  The  “refined”  histogram  showing  the  effect  of  incorporating  the  36 

censored  information.  Here  we  replaced  each  censored  residual 
by  its  “refined”  form,  which  is  defined  as  the  discrepancy 
between  the  final  conditional  expectation  of  the  censored  obser¬ 
vation  and  the  regressed  mean.  The  interpolated  histograms  illus¬ 
trate  again  that  the  Gaussian  assumption  is  adequate. 
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1.  INTRODUCTION 

The  problem  of  estimating  ^ody-wave  magnitudes  (mb)  using  amplitudes  read  at  a 
number  of  recording  stations  is  frequently  complicated  by  the  fact  that  the  data  may  be 
heavily  censored.  This  arises  either  because  of  clipping,  where  all  amplitudes  can  be  deter¬ 
mined  only  to  exceed  a  given  lower  bound  (i.e.  the  right-censored  case  in  statistical  terms),  or 
because  the  signals  are  weaker  than  the  ambient  noise  level  and  hence  are  not  detected  (i.e. 
the  left-censored  case).  If  one  simply  averages  the  amplitudes  for  those  stations  which  detect 
an  event,  without  regard  for  those  that  clipped  or  did  not  record,  serious  biases  may  result  in 
the  magnitude  estimated. 

The  problem  of  joindy  estimating  event  magnitude  and  station  corrections  (Douglas, 
1966;  von  Seggem,  1973)  using  data  that  are  incomplete  because  of  missing  station  readings 
and  censoring  (above  by  clipping  and  below  by  noise)  has  been  considered  in  Blandford  and 
Shumway  (1982).  In  the  case  where  the  errors  in  observing  the  amplitudes  are  uncorrelated 
from  event  to  event  and  from  station  to  station,  the  Expectation  Maximization  (EM)  algorithm 
(Dempster  et  al. ,  1977)  can  be  used  to  solve  the  multiparametric  version  of  the  maximum 
likelihood  estimation  problems  originally  considered  by  Ringdal  (1976)  for  the  left-censored 
case  and  by  von  Seggern  and  Rivers  (1978)  for  the  doubly  censored  case. 

We  will  examine  the  similarities  and  differences  between  two  iterative  estimators  for 
censored  model,  namely  the  Iterative  Least-Squares  method  (ILS)  due  to  Schmee  and  Hahn 
(1979)  and  the  Maximum-Likelihood  Estimator  (MLE).  These  two  methods  reduce  to  the 
same  Least  Squares  Multiple  Factors  (LSMF)  (Douglas,  1966)  method  when  the  censored  data 
are  deleted  from  the  network  observational  data.  Each  method  is  completely  characterized  by 
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a  constraint  on  the  choice  of  the  standard  deviation  (cr)  of  the  obscuring  noise,  and  a  is 
solved  through  iteration  along  with  the  event  magnitudes  and  the  station  corrections.  The 
final  value  of  a  for  each  iterative  scheme  can  be  used  as  a  good  approximation  to  the 
unbiased  estimate  of  the  standard  deviation  of  the  perturbing  noise.  By  scaling  this  a  value  by 
the  square  root  of  the  number  of  observations  associated  with  each  unknown  parameter,  the 
uncertainty  in  each  estimated  parameter  can  be  approximated  efficiently.  These  error  esti¬ 
mates  seem  to  differ  from  the  unbiased  standard  errors  only  by  a  common  multiplying  con¬ 
stant  across  all  stations  and  all  event  mb’s. 

The  bootstrap  method  (Efron,  1979,  1981)  has  been  used  as  a  means  of  the  uncertainty 
assessment  in  the  mb  estimation  problem  for  several  years  (Blandford  et  al. ,  1983;  McLaugh¬ 
lin,  1986a,  1986b,  1988).  We  will  suggest  a  slighdy  different  way  of  using  the  bootstrap 
technique.  Specifically,  we  propose  to  enlarge  the  pool  of  residuals  to  include  all  doubly  cen¬ 
sored  residuals  for  random  drawing,  and  to  change  the  definition  of  the  “censored  residual”. 
The  bootstrap  method  can  be  used  to  confirm  the  aforementioned  scaling  relationship  between 
the  individual  error  estimates  and  the  global  o  of  the  perturbing  noise. 
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2.  MODEL  ASSUMPTIONS  AND  GENERAL  IDEAS 

Consider  ne  explosions  recorded  at  some  or  all  of  ns  stations.  The  linear  model  for  the 
j’th  station  observing  the  i’th  event  is  that  the  magnitude  Y;j  can  be  represented  as 

Yy  =  Ei  +  Sj  +  ey  (1) 

where  depends  on  the  seismic  size  of  the  explosion,  Sj  is  the  station  correction  term,  and 

the  raw  magnitude  Yy,  which  contains  the  error  term  e^,  is  computed  by  (  log10(A/T)  +  B  ) 
(Veith  and  Clawson,  1972),  as  the  usual  practice.  Since  adding  a  common  constant  to  each  E; 
and  subtracting  the  same  constant  from  each  Sj  would  yield  another  set  of  solutions  to  (1), 
usually  the  rather  arbitrary  assumption  that  £  Sj  =  0  is  imposed  to  resolve  the  indeterminacy. 
The  obscuring  errors  ey  are  assumed  to  be  uncorrelated  and  to  belong  to  the  same  probability 
distribution,  namely  a  common  Gaussian  distribution  with  zero  mean  and  variance  a2.  The 
standard  regression  model  in  (1)  may  be  written  in  the  form 

Y  =  H  0  +  e  ,  (2) 

where  H  is  the  design  (or  observation)  matrix  of  dimension  npa[h  x  nq,  0  and  Y  are  column 

vectors  of  size  nq  =  ne  +  ns  -  1  and  n^  respectively,  where  npa[h  is  the  total  number  of  phy¬ 
sically  available  observations  (or  equations). 

For  the  non-censored  case,  Douglas’  (1966)  LSMF  method  is  identical  to  the  standard 
least-squares  (LS)  and 

0ls  =  (H'H)-1H'Y  .  (3) 

This  least-squares  estimator  has  many  optimality  properties.  For  instance,  it  is  unbiased,  and  it 

gives  minimum  variance  within  the  class  of  linear  unbiased  estimators.  It  also  minimizes  the 
residual  sum  of  squares: 
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RSS  =  (Y-HQ)'(Y— H9)  .  (4) 

Furthermore,  8ls  is  also  the  Maximum-Likelihood  Estimate  (MLE)  under  the  Gaussian 

assumption.  It  is  very  easy  to  compute  the  uncertainty  by  using  Var[§Ls]  =  a2(HTI)  *,  which 
is  simply  scaling  the  variance  of  the  random  perturbations  by  the  number  of  observations 
associated  with  each  unknown.  This  a  of  the  random  perturbations  turns  out  to  be  the  most 
important  factor  in  determining  the  precision  of  the  individual  mb  estimate,  even  in  the  doubly 
censored  case. 

We  will  assume  that  there  are  four  categories  of  observational  data  that  may  be  available 
in  any  given  situation.  For  the  n*  observation  in  category  m,  the  observed  y^  will  be  in  one 
of  the  following  four  categories  (Blandford  and  Shumway,  1982): 

m  =  0 

denotes  an  observation  which  is  available, 
m  =  1 

denotes  an  observation  yin,  known  only  to  be  below  some  threshold  level  tln;  yln  <  tln. 
m  =  2 

denotes  an  observation  y^,  known  only  to  be  above  some  threshold  level  t2n;  y2n  -  t^- 
m  =  3 

denotes  the  case  where  y3n  is  not  observed  as  in  the  case  of  a  station  not  recording  a  par¬ 
ticular  event. 

Throughout  this  study,  we  will  assume  that  there  are  n0,  n,,  n2,  and  n3  paths  associated 
to  these  four  categories,  respectively.  Obviously,  n0+n!-i-n2+n3  =  nc  x  ns:,  and  n0+nt+n2  = 
npalh.  We  will  use  y0,  tj,  and  t2  to  denote  the  collection  of  type  0,  1,  and  2  observations 
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respectively.  It  is  not  difficult  to  derive  an  estimator  for  the  doubly  censored  linear  model 
such  that  the  estimator  would  coincide  with  LSMF  whenever  it  is  applied  to  a  non-censored 
data  set.  For  instance,  simply  taking  the  LSMF  and  ignoring  all  censored  observations  is  an 
obvious  yet  uninteresting  case.  Any  nontrivial  generalization  of  the  LSMF  to  the  censored 
model  should  at  least  possess  some  kind  of  optimality  property,  or  it  should  seem  natural.  It 
seems  very  difficult  to  develop  a  procedure  which  can  retain  all  the  optimality  properties  that 
the  LSMF  has  in  the  non-censored  case,  e.g. ,  unbiasedness,  least-squares,  and  maximum- 
likelihood.  Henceforth,  we  will  develop  a  collection  of  estimators  based  on  some  general  idea, 
and  then  impose  an  extra  constraint  to  make  the  solution  optimal  in  some  sense. 

For  notational  convenience  we  will  define 

ft  =  H0.  (5) 

Let  a  be  the  standard  deviation  of  the  independent  Gaussian  errors  e^  in  equation  (1), 

and  consider 


and 


«u) = ■ 

<Ku)  =  [j!>(x)dx  , 


(6a) 

(6b) 


SM  a  ^  ■  <6c> 

Suppose  that  o  is  known,  and  suppose  that  the  current  (i.e. ,  the  r111  step)  estimate  for  0  is 

0r.  Then  the  conditional  expectations  of  the  censored  observations  based  on  current  estimates 


of  the  system  parameters  are  the  following: 


Er(Yoj  I  Y0j  -  top  ~  toj  - 


(7a) 
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Er(Y !j  I  Yjj  <  tjj)  =  mrj  -  oS(2lj)  , 


(7b) 


where  Er  denotes  expectation  with 


Er(Y2j  I  Y2j  ^  t2j)  -  (J.2j  +  aS(-z2j)  , 


(7c) 


i 


=  1,2;  j  =  I,2...nj  . 


(8) 


In  both  the  ILS  and  MLE  procedures,  the  conditional  expectations  in  formula  (7)  (i.e. , 
the  “best  fill-in”  option  as  discussed  in  Gleit  (1985))  are  then  used  to  obtain  the  updated 
coefficients  0  by  standard  regression,  i.e. , 


0r+1  =  (HUr'H'Y1  (9) 

where  Yr  denotes  the  data  vector  Y  with  conditicnal  expectations  replacing  the  censored 

values  (Schmee  and  Hahn,  1979;  Aitkin,  1981;  Blandford  and  Shumway,  1982). 

Procedures  (7)  through  (9)  are  repeated  until  the  parameter  estimate,  0r,  converges.  In 
the  non-censored  case,  neither  the  iteration  nor  the  a  priori  information  about  a  is  required, 
since  a  is  uniquely  determined  as  the  RMS  residual  after  a  single  regression  on  the  original 
type  0  data.  However,  this  is  no  longer  the  case  for  censored  models.  For  an  arbitrary  a,  a  0 
can  always  be  found  with  the  iteration  procedure  discussed  above,  whenever  the  non-detection 
and  the  clips  are  present.  Therefore  it  is  necessary  to  impose  some  extra  constraint  to  make 
the  solution  unique  and  optimal.  Two  typical  constraints  on  a  will  be  discussed. 

0  in  Equation  (9)  can  be  solved  through  direct  inversion,  or  equivalently,  it  can  be  com¬ 
puted  by  stacking  the  equations  due  to  the  specific  form  of  the  observation  model  (1)  (Bland¬ 
ford  and  Shumway,  1982).  In  fact,  stacking  the  equations  (i.e. ,  network  averaging)  would  be 
much  more  efficient  and  more  accurate  than  the  cumbersome  matrix  algebra.  To  adopt  this 
approach,  the  purely  predicted  values  for  type  3  paths  defined  as 
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E(Y3j  1  Y3j  is  missing  )  =  (10i 

must  also  be  included  in  the  stacking. 

Note  that  in  each  iteration  loop,  the  regression  procedure  (9)  is  carried  out  with  the 
“refined”  data  set  Y*  (Equation  (7)),  in  which  we  have  explicitly  assigned  the  equal  weight  to 
each  transformed  path  ((7b)  and  (7c))  as  well  as  each  regular  signal  (7a).  In  fact,  the  observa¬ 
tion  matrix  H  in  equation  (2)  treats  censored  paths  just  the  same  as  regular  paths.  Therefore 
it  seems  reasonable  to  define  the  precision  in  each  parameter  estimate  by  dividing  o  by  the 
square  root  of  the  total  number  of  paths  associated  with  that  specific  parameter.  This  is 
exactly  the  same  logic  used  in  the  case  of  the  non-censored  model.  We  will  apply  this  idea  to 
both  the  ILS  and  the  MLE,  and  check  the  validity  of  this  approach  by  the  bootstrap  method  in 
a  later  section. 


Teledyne  Geotech 


7 


June  1988 


Iterative  Network  m.  Estimators  With  Censored  Data  TGAL-88-06 

O 

3.  ITERATIVE  LEAST  SQUARE  METHOD 

In  the  non-censored  case,  LSMF  minimizes  the  residual  sum  of  squares  (equation  4). 
For  the  censored  case,  we  detine  the  “squared  penalty  loss”  as 

A  (a)  s  (Y— H0)'(Y-H0) ,  (1 1) 

where  0  =  0(a)  is  the  parameter  estimate  determined  by  a  fixed  a  using  the  iterative  pro¬ 
cedures  described  in  the  last  section,  and  Y  the  final  data  vector  with  censored  values  replaced 
by  the  conditional  expectations  (cf. ,  Equation  7).  In  other  words,  A(a)  is  the  sum  of  all 
squared  discrepancies  between  the  regressed  parameters  (i.e. ,  the  mean)  and  the  “best  guess” 
of  the  observational  data  based  on  the  censoring  information.  Obviously  A(a)  can  be  con¬ 
sidered  as  a  measure  of  the  goodness  of  fit.  In  the  non-censored  case  (4),  A  =  RSS,  and  there 
exists  a  unique  6  such  that  A  =  n0  &  .  Furthermore,  substituting  any  other  value  of  a  into  (7) 
through  (9)  will  not  affect  the  result.  In  fact,  this  unique  d  is  simply  the  root-mean-squared 
(RMS)  residuals.  Along  the  same  line  of  thought,  there  exists  a  unique  6  such  that 

A(d)  =  npath62  ,  (12) 

where  npat}j  =  (n0+n1+n2).  Thus  this  d  can  be  visualized  as  the  generalized  “root-mean- 

squared-residuals”  for  the  censored  model  in  a  very  loose  sense.  The  corresponding  estimator 
is  tentatively  called  the  Iterative  Least  Square  (ILS).  Since  this  a  is  not  known,  it  must  be 
solved  together  with  the  0.  This  ILS  scheme  is  very  similar  to  that  in  Schmee  and  Hahn 
(1979)  and  Aitkin  (1981)  except  that  they  use  npalh-nq  in  (12).  The  reason  for  this 
modification  will  become  clear  in  a  later  section. 

The  ILS  condition  A  =  (n0+nj+n2)  <j2  implies  the  following  simpler  form,  which  is 
appropriate  for  solving  a  iteratively: 
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£(yoj  -  Moj)2 

a2  = - - - - .  (13) 

n0  +  n,  +  n2  -  Xs(zlj)2  “  Zs(_z2j)2 

j=l  j=i 

It  is  solved  iteratively  as  follows:  starting  with  the  initial  guess  of  0  and  a2  ( e.g . ,  by  treating 
the  censored  data  as  uncensored,  or  using  non-censored  data  alone),  the  conditional  expecta¬ 
tions  are  calculated  (Equation  (7))  and  a  new  0  is  obtained.  Then  o2  is  updated  with  (13)  and 
the  procedure  is  repeated  until  o2  and  0  converge.  Note  that  SCz^)  and  S(-z2j)  on  the  right- 
hand  side  of  (13)  are  computed  with  the  current  estimate  of  o,  i.e.  ar  ,  while  o  on  the  left- 
hand  side  is  the  updated  one,  i.e.  or+1  . 

A  correction  for  the  degrees  of  freedom  (D.O.F.)  of  the  whole  system  is  suggested  for 
application  to  the  limit  of  or+1  in  (13): 


&ls  =  — -np— ■  limfa2!)  .  (14) 

npath  nq 

For  the  non-censored  models,  this  D.O.F.  adjustment  will  convert  the  conventional  RMS 
residual  to  the  the  unbiased  a  estimate  of  the  obscuring  noise.  For  censored  data,  the  adjust¬ 
ment  in  (14)  is  conjectured  to  be  able  to  reduce  substantially  the  bias  in  a,  although  a  (closed 
form)  full  correction  for  the  bias  is  not  available  yet  (Schmee  and  Hahn,  1979;  Aitkin,  1981). 
We  will  show  that  dILS  differs  from  the  bootstrap  a  only  by  a  multiplicative  constant. 
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4.  MAXIMUM-LIKELIHOOD  METHOD 


Suppose  we  seek  the  Maximum- Likelihood  Estimates  (MLE)  instead  of  the  least- squares 
solution,  i.e. ,  we  look  for  the  parameter  estimates  which  maximize  the  following  log- 
likelihood  function  of  the  “incomplete  data  sampler”  y0,  tj,  t2  : 


In  L(y0,  tl5 12  I  8,  o2)  =  -  —  ln(2;ta2)  - 


2a2jF,<yoi  Moj> 


2 


+  s  In  <&(zij)  +  Yj  ln  ®(-Z2j) 
j=l  j=l 


where  z,r  d>,  <J),  and  S  are  defined  earlier. 


(15) 


A  number  of  procedures  may  be  used  to  maximize  the  log-likelihood  function,  such  as 
Newton-Raphson,  or  scoring.  If  the  dimension  of  9  is  large,  then  these  methods  become  com¬ 
putationally  intractable. 

Differentiating  (15)  with  respect  to  a  and  setting  the  derivative  equal  to  zero,  one  sees 
that  a2  must  satisfy  the  following  condition  (Aitkin,  1981): 

no 

£(yoj  -  ^oj)2 

a2  = - - .  (16) 

ni  n2 

n0  +  £S(zlj)zlj  -  £S(~Z2j)z2j 
j=l  j=l 

Similar  to  (13)  for  the  ILS  method,  (16)  is  solved  iteratively  with  the  Expectation  Max¬ 
imization  (EM)  algorithm  (Dempster  etal.,  1977):  the  “incomplete  data”  (i.e. ,  the  censored 
observations)  in  the  sufficient  statistics  are  replaced  at  each  iteration  by  their  conditional 
expectations  (7),  given  the  observed  data  and  the  current  parameter  estimates,  0r  and  ar,  (the 
“E”  step).  New  parameter  estimates  0r+1  are  then  obtained  from  the  replaced  sufficient  statis¬ 
tics  as  if  they  had  come  from  a  complete  sample,  and  then  substituted  into  (16)  to  solve  for 
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the  updated  <rr+1  (the  “M”  step).  The  EM  algorithm  has  been  applied  to  many  different 
research  fields  recently  (e.g.  Shumway,  1982;  Shumway  and  Azari,  1988).  Blandford  and 
Shumway  (1982)  used  an  equivalent  yet  more  complicated  formulation  of  (16). 

Once  the  iteration  loop  converges,  i.e.  or+1  =  ar,  the  same  D.O.F.-adjustment  (14)  is 
suggested  as  in  the  ELS  case  for  consistency. 

Note  that  in  the  non-censoring  case,  tq  =  n2  =  0,  both  equation  (16)  and  equation  (13) 
will  become  the  conventional  RMS  residual: 

no 

2  £(yoi^>2  07, 

<r  =  - - . 

no 

There  is  an  easy  way  to  distinguish  the  ILS  and  the  MLE.  Recall  that  “squared  penalty 
loss”  A  was  defined  as  (Y— H0)'(Y— H0),  where  Y  =  E(Yly0,t1,t2).  The  ELS  chooses  A/n^  as 
a2,  while  the  MLE  chooses  E[(Y-H0)'(Y-H0)l  yo,ti,t2]/npath-  Since  the  computation  of  con¬ 
ditional  expectation  is  not  commutative,  the  ELS  and  the  MLE  will  end  up  with  different  esti¬ 
mates  of  a,  and  hence  all  the  event  magnitudes  and  station  effects  as  well  as  their  associated 
uncertainty  estimates  will  be  different  in  general. 

EXAMPLE  l 

To  illustrate  how  the  EM  and  the  ELS  methods  can  be  used  directly  to  estimate  the 
uncertainty  associated  with  their  corresponding  optimum  parameter  estimates,  we  have 
selected  the  same  four  events  as  in  Blandford  and  Shumway  (1982).  The  epicenters  of  these 
events  are  listed  in  Table  1.  Table  2  gives  the  raw  magnitude  data  recorded  at  71  WWSSN 
stations  that  we  used.  It  also  lists  the  estimated  station  terms,  Sj,  using  the  LSMF,  the  ILS, 
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and  the  MLE  methods.  The  errors  in  0ILS  and  0mle  316  computed  with  the  same  way  as  for 
LSMF,  namely  Var[0]  = 


Table  1.  Epicenters  &  Dates 

Date 

Name 

Latitude 

Longitude 

631026 

Shoal 

39.200n 

118.380w 

660602 

Piledriver 

37.230n 

116.060w 

631020 

Rubis 

24.000n 

5.000e 

650227 

Saphir 

24.060n 

5.030e 
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Table  2.  Raw  mh  at  71  WWSSN  Sites  and  The  Station  Terms 


Shoal 

Piledriver 

Rubis 

1BI 

LSMF 

ELS 

MLE 

AAE 

— 

— 

5.29 

5.81 

-0.08+0.18 

-0.08+0.19 

-0.08+0.20 

AAM 

5.09 

5.65 

5.29 

— 

-0.02±0.15 

0.09+0.16  , 

0.10+0.17 

AKU 

— 

5.27 

— 

6.08 

0.04±0.18 

0.06+0.19 

0.0610.20 

ALQ 

— 

— 

5.85 

5.96 

0.28+0.18 

0.27+0.19 

0.27+0.20 

ANT 

ebb 

5.86 

5.44 

5.76 

0.09+0.15 

0.03+0.14 

0.03+0.14 

AQU 

— 

>5.10 

5.89 

0.16±0.25 

0.0410.16 

0.04+0.17 

ARE 

5.05 

5.92 

5.45 

5.74 

0.08±0.13 

0.16+0.14 

0.16+0.14 

ATL 

5.12 

5.61 

6.08 

0.2410.15 

0.3510.16 

0.3610.17 

ATU 

<6.06 

— 

5.51 

>5.58 

-0.0110.25 

0.06+0.16 

0.06+0.17 

BHP 

<4.64 

<5.23 

6.01 

6.01 

0.3910.18 

0.0410.14 

0.0410.14 

BLA 

5.08 

5.50 

— 

5.92 

0.0610.15 

0.16+0.16 

0.17+0.17 

BOG 

<4.99 

5.27 

<5.14 

-0.2610.25 

-0.3610.16 

-0.37+0.17 

BOZ 

— 

— 

5.81 

6.06 

0.3110.18 

0.30+0.19 

0.30+0.20 

BUL 

— 

— 

5.45 

5.58 

-0.1110.18 

-0.12+0.19 

-0.12+0.20 

CAR 

5.55 

5.71 

5.08 

5.32 

-0.0410.13 

0.04+0.14 

0.04+0.14 

CHG 

— 

— 

— 

5.67 

-0.0610.25 

-0.10+0.27 

-0.10+0.29 

CMC 

— 

5.00 

— 

SE£3i 

-0.3610.18 

-0.34+0.19 

-0.34+0.20 

COL 

— 

5.73 

— 

MMIM 

-0.4410.18 

-0.42+0.19 

-0.42+0.20 

COP 

<5.13 

<5.61 

5.44 

<5.56 

-0.0810.25 

-0.23+0.14 

-0.24+0.14 

DAL 

— 

— 

5.55 

5.79 

0.0510. 1 8 

0.0410.19 

0.04+0.20 

DUG 

— 

— 

5.47 

5.73 

-0.0310.18 

-0.03+0.19 

-0.03+0.20 

ESK 

— 

<5.28 

— 

0.1410.25 

-0.11+0.19 

-0.12+0.20 

FLO 

4.98 

— 

— 

| 

0.0910.18 

0.21+0.19 

0.21+0.20 

GDH 

<4.99 

<5.29 

6.01 

6.14 

0.4510.18 

0.16+0.14 

0.16+0.14 

GEO 

<5.44 

5.40 

6.08 

5.69 

0.1310.15 

0A5±0A4 

0.15+0.1 4 

GIE 

— 

5.57 

— 

— 

0.0410.25 

0.11+0.27 

0.1110.29 

GOL 

— 

5.05 

egg 

-0.5310.18 

-0.54+0.19 

-0.54+0.20 

JER 

— 

— 

— 

El 

0.5210.25 

0.4810.27 

0.48+0.29 

KEV 

<5.03 

5.34 

5.68 

>6.26 

-0.0110.18 

0.17+0.14 

0.17+0.14 

KIP 

5.47 

6.10 

— 

0.4910.18 

0.66+0.19 

0.67+0.20 

KOD 

— 

— 

— 

5.39 

-0.3410.25 

-0.38+0.27 

-0.38+0.29 

KON 

<4.92 

5.78 

5.38 

0.1210.15 

0.09+0.14 

0.09+0.14 

KTG 

4.95 

5.90 

<4.96 

mm 

-0.0810.15 

-0.14+0.14 

-0.1410.14 

LON 

— 

— 

<5.10 

5.61 

-0.12+0.25 

-0.35+0.19 

-0.36+0.20 

LOR 

— 

5.78 

— 

0.25+0.25 

0.3210.27 

0.32+0.29 
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Table  2.  (Continued) 

Shoal 

Piledriver 

Rubis 

Saphir 

LSMF 

ILS 

MLE 

LPB 

5.06 

— 

5.22 

5.47 

-0.19±0.15 

-0.1010.16 

-0.1010.17 

LPS 

— 

5.29 

5.22 

— 

-0.27±0.18 

-0.2310.19 

-0.2310.20 

LUB 

— 

— 

6.44 

0.71±0.25 

0.6710.27 

0.6710.29 

MAL 

<4.91 

5.35 

— 

— 

-0.1810.25 

-0.1510.19 

-0.1510.20 

MDS 

5.07 

— 

5.37 

5.55 

-0.1110.15 

-0.0210.16 

-0.0210.17 

MNN 

4.42 

— 

5.40 

5.73 

-0.2510.15 

-0.1710.16 

-0.1710.17 

NAI 

---- 

5.43 

>5.86 

-0.0910.25 

0.0910.19 

0.1010.20 

NAT 

— 

5.84 

— 

— 

0.3110.25 

0.3810.27 

0.3810.29 

NDI 

— 

— 

5.29 

5.64 

-0.1610.18 

-0.1710.19 

-0.1710.20 

NNA 

<4.45 

5.64 

5.42 

5.86 

0.0510.15 

-0.0610.14 

-0.0610.14 

NOR 

— 

5.23 

-0.4910.18 

-0.4710.19 

-0.4710.20 

NUR 

<4.72 

5.36 

5.12 

5.42 

-0.2910.15 

-0.2910.14 

-0.3010.14 

OGD 

5.08 

5.10 

— 

5.66 

-0.1610.15 

-0.0610.16 

-0.0610.17 

OXF 

_ 

5.98 

— 

— 

0.4510.25 

0.5210.27 

0.5210.29 

PEL 

— 

5.47 

— 

6.07 

0.1410.18 

0.1610.19 

0.1610.20 

POO 

— 

— 

— 

5.64 

-0.0910.25 

-0.1310.27 

-0.1310.29 

PRE 

— 

— 

5.30 

5.44 

-0.2610.18 

-0.2610.19 

-0.2610.20 

PTO 

— 

5.31 

5.32 

5.69 

-0.1510.15 

-0.1410.1 6 

-0.1410.17 

QUE 

— 

— 

4.70 

4.93 

-0.8110.18 

-0.8210.19 

-0.8210.20 

E 

— 

— 

6.72 

6.45 

0.9610.18 

0.9510.19 

0.9510.20 

WSM 

4.81 

4.97 

5.52 

5.76 

-0.1910.13 

-0.1110.14 

-0.1110.14 

EH 

— 

— 

— 

5.78 

0.0510.25 

0.0110.27 

0.0110.29 

Ell® 

4.83 

5.52 

— 

— 

-0.1210.18 

0.0510.19 

0.0610.20 

SHA 

<5.15 

5.64 

5.86 

6.16 

0.2910.15 

0.2710.14 

0.2710.14 

SHI 

— 

— 

5.81 

>5.36 

0.2910.25 

0.3110.19 

0.3110.20 

SHK 

— 

5.49 

— 

— 

-0.0410.25 

0.0310.27 

0.0310.29 

SHL 

— 

— 

5.33 

5.45 

-0.2410.18 

-0.2410.19 

-0.2410.20 

SJG 

— 

5.67 

— 

5.83 

0.1210.18 

0.1410.19 

0.1410.20 

STU 

<4.86 

5.46 

5.90 

6.19 

0.2610.15 

0.1910.14 

0.1910.14 

TOL 

— 

5.55 

— 

— 

0.0210.25 

0.0910.27 

0.0910.29 

TRI 

<4.91 

5.37 

5.49 

5.69 

-0.0810.15 

-0.0810.14 

-0.0810.14 

TRN 

<4.72 

5.48 

5.23 

5.69 

-0.1310.15 

-0.1510.14 

-0.1510.14 

UME 

<4.50 

5.76 

5.27 

5.60 

-0.0510.15 

-0.1310.14 

-0.1310.14 

VAL 

<5.09 

<5.09 

5.55 

5.61 

-0.0510.18 

-0.2010.14 

-0.2010.14 

WES 

<4.52 

4.95 

5.45 

5.63 

-0.2510.15 

-0.2810.14 

-0.2910.14 

WIN 

— 

— 

5.01 

5.50 

-0.3710.18 

-0.3810.19 

-0.3810.20 
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In  the  first  experiment,  each  of  the  LSMF,  MLE,  and  ILS  was  separately  used  with  non- 
censored  data  ( i.e . ,  signals)  alone.  They  all  give  the  same  results  as  shown  in  the  LSMF  row 
in  Table  3.  The  147  good  paths  yield  an  RMS  residual  of  0.179,  and  d  =  0.254  regardless  of 
the  method  used.  Note  that  the  third  digit  after  the  decimal  point  is  actually  computer- 
dependent. 


Table  3.  Comparison  of  LSMF,  MLE,  ILS 

Name 

Shoal 

Piledriver 

Rubis 

Saphir 

signals 

14 

38 

42 

53 

noise 

20 

5 

3 

1 

clips 

0 

0 

1 

4 

LSMF 

5.054±0.068 

5.52810.041 

5.51610.039 

5.73510.035 

MLE 

4.777±0.049 

5.46110.044 

5.50210.042 

5.76710.038 

ILS 

4.784±0.046 

5.46110.041 

5.50210.040 

5.76710.035 

When  the  noise  and  clips  are  added  for  separate  MLE  and  ILS  runs,  the  bias  reduction 
then  becomes  quite  obvious  for  Shoal.  Since  there  were  only  a  few  censored  paths  associated 
with  the  remaining  three  events,  their  m^^  estimates  are  already  quite  acceptable.  The  mb’s 
of  three  out  of  four  events  decrease  once  the  censored  data  (mainly  non-detections)  are 
included.  The  mb  of  Saphire  slightly  increases,  because  there  are  more  clips  than  non¬ 
detections. 

If  we  consider  lim(ar+1)  the  “generalized  RMS  residual”,  and  assume  that  the  D.O.F.- 
adjustment  is  sufficient  to  remove  the  bias  in  the  variance  estimate,  then  the  ILS  appears  to  be 
able  to  generate  a  smaller  standard  deviation  than  does  the  MLE.  It  L  henceforth  not  surpris¬ 
ing  that  the  error  estimates  in  mhn  s  are  systematically  smaller  than  those  in  rhbMIJF  (Tables  2 
and  3).  However,  since  neither  d(I  <-  nor  p  is  proven  unbiased,  the  error  estimates  in 
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Tables  2  and  3  are  actually  multiplied  by  an  unknown  constant  across  the  four  events  as  well 
as  all  71  stations.  This  constant  factor  for  the  ILS  method  will  be  different  from  that  for  the 
MLE,  as  illustrated  in  the  bootstrap  experiments  in  a  later  section.  In  this  example,  the  possi¬ 
bly  biased  estimates  of  the  standard  deviations  of  the  perturbing  noise  derived  by  different 
methods  are  dILS  =  0.270  (Equations  (13)  and  (14))  and  dMLE  =  0.286  (Equations  (16)  and 
(14)),  respectively.  The  log-likelihood  function  (15)  evaluated  at  0MLE  is  -5.622  which  is 
larger  than  that  of  -6.103  evaluated  at  0^,  as  expected. 
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5.  BOOTSTRAP  METHOD 

Efron’s  bootstrap  method  (1979,  1981;  also  Efron  and  Tibshirani,  1985)  has  been  pro¬ 
posed  by  Blandford  et  al.  (1983)  to  estimate  the  uncertainty  in  mb  estimates  and  station 
terms.  Some  recent  work  on  the  magnitude  estimation  problem  has  extensively  utilized  this 
technique  (McLaughlin  et  al. ,  1986a,  1986b;  McLaughlin,  1988). 

Suppose  the  optimal  estimates  of  the  components  of  0  {i.e. ,  the  collection  of  till  Es  and 
Sj)  are  available  by  using  some  estimation  algorithm.  To  apply  the  bootstrap  method,  one  has 
to  construct  three  pools  of  residuals.  For  a  type  0  (uncensored)  path,  we  define  its  “regular 
residual”  as 

ey  =  Yy  -  E;  -  Sj  ,  (19) 

where  (i,  j)  runs  through  all  paths  of  category  0. 

For  each  noisy  path  {i.e.  Ej  +  Sj  +  ey  <  Yy  ),  the  corresponding  “censored  residual”  is 
defined  as  ey  <  ty  =  Yy  -  Ej  -  Sj.  These  are  almost  the  same  as  the  regular  residuals  except 
that  the  equal  sign  was  used  in  equation  (19).  Censored  residuals  from  clipped  paths  are  gen¬ 
erated  in  a  similar  way,  and  they  are  stored  in  the  same  bowl  together  with  the  “regular  resi¬ 
duals”  and  the  “noisy  residuals”.  For  instance,  after  the  Veith-Clawson  distance  normaliza¬ 
tion,  the  WWSSN  station  AQU  (42.354°N,  13.403°E,  in  Aquila,  central  Italy)  reported  the 
raw  magnitudes  of  Shoal,  Rubis,  and  Saphire  as  <4.97,  >5.10,  and  5.89,  respectively  (Table 
2).  By  subtracting  mbMij.  (Table  3)  and  S^leIAQU]  =  0.04  (Table  2)  from  these  readings,  we 
get  three  MLE  residuals:  <0.15,  >-0.44,  and  0.08.  Note  that  AQU  was  able  to  report  a  larger 
reading  such  as  Saphire  with  mb  5.89,  and  it  was  clipped  at  a  lower  clipping  threshold  (5.10) 
for  Rubis,  even  though  Rubis  and  Saphire  were  both  detonated  at  the  same  French  test  site  in 
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Algeria.  The  same  observation  can  be  made  with  WWSSN  station  LON  (46.750°N, 
121.810°W,  in  Longmire,  Washington,  U.S.)  (Table  2).  As  more  events  are  used  in  the  regres¬ 
sion,  this  phenomenon  becomes  more  severe  in  that  fewer  stations  can  maintain  well-defined 
detection  and  clipping  thresholds  in  the  “magnitude”  domain.  The  easiest  way  to  account  for 
such  unpredictable  features  across  a  large  seismic  network  is  to  model  them  with  the  inter¬ 
changeable  random  residuals  as  presented  here. 

Now  for  each  path  (i,  j),  a  pseudo-magnitude  is  constructed  by  perturbing  its  predicted 
value,  Ej  +  Sj,  with  a  randomly  drawn  residual  e^  : 

Y;j*  =  Ej  +  Sj  +  eiT  (20) 

where  (i',  j')  denotes  the  identifier  of  the  randomly  drawn  event-station  pair.  If  this  residual 
was  censored,  then  the  equality  sign  in  equation  (20)  should  be  replaced  by  <  or  >,  according 
to  the  flag  of  e(y.  Thus  we  are  randomly  allocating  the  residuals  (with  replacement)  to 
obscure  the  predicted  magnitude  value.  A  good  path  ( i.e .  of  type  0)  might  become  clipped  or 
noisy  during  certain  resampling  procedures,  but  missing  paths  (i.e.  type  3  “observations”) 
remain  invariant 

A  ^  A 

The  pseudo-magnitude  samples  are  used  to  reconstruct  new  estimates  E;  and  Sj  using  a 
selected  estimator.  This  Monte  Carlo  procedure  is  repeated  many  times,  and  the  sample  stan- 
dard  deviations  of  the  E;  ’s  and  Sj  ’s  are  proposed  by  Efron  (1981)  as  good  approximations  to 
the  standard  errors  of  the  original  estimators  Ej  and  Sj.  The  D.O.F.-adjustment  (14)  is  applied 
to  the  bootstrap  standard  errors  for  consistency. 

Using  the  combination  of  the  MLE  and  the  bootstrap  (with  200,  400,  and  600  bootstrap 
resamplings),  the  standard  error  estimates  of  the  four  events  in  Example  1  are  listed  in  Table 
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4  below: 


Table  4.  MLE  +  Bootstra 

0 

Event 

std.  er.  200 

std.  cr.  4Qo 

std.  er.  600 

&MLE+B 

Shoal 

0.059 

0.059 

0.058 

0.005 

0.336 

Piledriver 

0.049 

0.052 

0.053 

0.009 

0.345 

Rubis 

0.043 

0.045 

0.045 

0.007 

0.308 

Saphir 

0.042 

0.046 

0.056 

0.007 

0.332 

In  the  table,  std.  er.  -v  i=200,400,600  represents  the  uncertainty  estimates  made  with  200, 
400,  and  600  bootstrap  resamplings,  respectively.  Comparing  Tables  3  and  4,  it  is  obvious  that 
the  errors  in  each  event  as  estimated  by  the  MLE  with  the  bootstrap  method  are  very  close  to 
what  we  obtained  with  one  single  call  of  the  MLE  alone.  The  discrepancy  between  the  aver¬ 
aged  m^  B  after  600  resamplings  and  the  m,.  is  negligible  (column  5  of  Table  4).  If  we 
extract  the  bootstrap  standard  error  in  each  station  and  event  (the  “std.  er.  in  Table  4) 
and  multiply  it  by  the  square-root  of  the  total  number  of  observations  associated  (signals, 
noise,  and  clips),  the  result  (column  6  of  Table  4)  appears  to  be  a  pretty  good  approximation 
of  6mle.  The  root-mean-squared  value  of  such  <3mle+b  computed  from  71  stations  and  4 
events  is  0.296,  whereas  the  $MLE  was  0.286. 

We  also  performed  the  same  experiment  based  on  a  combination  of  ILS  and  the 
bootstrap,  as  shown  in  Table  5  below: 
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Table  5.  ILS  +  Bootstrap  ] 

Event 

std.  er.  200 

std.  er.  400 

std.  er.  ^ 

EH 

Shoal 

0.059 

0.059 

0.057 

0.005 

0.332 

Piledriver 

0.049 

0.052 

0.052 

0.008 

0.342 

Rubis 

0.043 

0.044 

0.044 

0.007 

0.298 

mum 

0.041 

0.045 

0.043 

0.006 

0.328 

In  this  case,  the  root-mean-squared  Ob^+b  (averaged  over  71  stations  and  4  events)  is 
0.294,  whereas  the  d[LS  was  0.278. 

Efron  (1981)  and  McLaughlin  (1988)  perform  the  resampling  among  the  observed  data 
themselves,  because  there  was  only  one  unknown  parameter  and  one  distribution.  In  the  situa¬ 
tion  of  multivariate  estimation,  a  centering  process  (to  remove  the  mean)  is  necessary,  and  the 
resampling  is  carried  out  on  the  residuals  (Blandford  et  al. ,  1983). 

Note  that  in  Blandford  et  al.  (1983),  the  resampling  of  the  residuals  is  limited  to  the 
paths  of  type  0  only,  and  all  clipped  or  noisy  observations  are  held  fixed  at  their  threshold 
levels  during  all  bootstrap  iterations.  The  same  scheme  is  also  applied  by  McLaughlin  (1986a, 
1986b).  This  is  based  on  the  assumption  that  the  same  stations  might  tend  to  be  noisy  or 
clipped  from  event  to  event  (Blandford  et  al. ,  1983),  which  could  be  true  if  all  events  are 
from  the  same  test  site.  For  events  from  several  test  sites,  we  have  noticed  that  keeping 
clipped  and  noisy  paths  unchanged  will  cause  a  larger  discrepancy  between  the  original 

>  A 

optimal  estimators  { Ej,  Sj)  and  the  averaged  bootstrap  estimators,  and  hence  it  will  yield 
questionable  uncertainty  estimates.  This  observation  led  us  to  an  alternative  approach  to  treat 
all  residuals  equally,  whether  they  are  censored  or  not,  in  the  resampling. 
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EXAMPLE  2 

During  the  past  several  years.  Geotech’s  WWSSN  database  has  been  gradually  expanded 
to  110  events  (totaling  328  usable  “a”,  “b”,  and  “max”  event  phases)  (McLaughlin,  1986a; 
Chan  et  al. ,  1988).  We  have  separately  applied  the  ILS,  the  MLE,  the  ILS  with  the  bootstrap, 
and  the  MLE  with  the  bootstrap  on  the  complete  data  set. 

The  328  phases  were  treated  as  uncorrelated  events  in  the  regressions.  The  7547  good 
paths  ( i.e . ,  signals  only)  yield  an  RMS  residual  of  0.258  and  dp^Mp  of  0.266.  When  the  5012 
noisy  and  950  clipped  paths  are  added  to  the  input  data  set,  is  0.263,  and  <5Mle  *s  0-329, 
roughly  in  accord  with  other  work  (e.g. ,  Bache,  1982;  Veith  and  Clawson,  1972).  The  log- 
likelihood  values  computed  at  0ILS  and  0MLE  aic  -4760  and  -4325,  respectively. 

Figure  1  plots  the  log-likelihood  and  the  squared  penalty  loss  as  a  function  of  o  in  the 
range  from  0.10  to  1.00,  for  the  censored  data  in  this  example.  As  already  mentioned  earlier, 
each  G  determines  a  unique  solution  of  0  through  the  iterations  ((7)-(9)).  A  question  of  top 
importance  is  how  to  constrain  the  G  for  optimality.  From  the  log-likelihood  function 
evaluated  at  every  possible  a  (filled  circles  in  Figure  1),  it  is  clear  that  the  log-likelihood 
function  attains  its  maximum  at  RMS  a  =  0.324,  or  dMLE  =  0  329  after  the  D.O.F.- 
adjustment.  On  the  other  hand,  the  squared  penalty  loss  function  A(g)  (shown  as  “+”sign) 
has  no  apparent  critical  point  in  the  range  of  interest.  The  ILS  method  picks  the  G  with  A  = 
(n0+ni+n2)  a2,  which  is  valid  only  at  RMS  G  =  0.258,  or  d[LS  =  0.263  after  the  D.O.F.  adjust¬ 
ment.  Since  the  ILS  method  assumes  that  each  censored  path  contributes  an  equal  amount  (or 
weight)  of  information  as  does  any  non-censored  path,  conceptually  this  is  already  the 
extremal  usage  of  the  censored  data,  and  hence  no  smaller  o  is  acceptable  from  a  realistic 
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Figure  1.  The  log-likelihood  and  the  squared  penalty  loss  as  a  function  of  a  in  the  range 
from  0.10  to  1.00.  The  log-likelihood  function  attains  its  maximum  at  RMS  a  =  0.324,  or 
dMLE  =  0.329  after  the  D.O.F.  adjustment  The  squared  penalty  loss  function  A(o)  increases 
almost  linearly  as  o  increases,  and  it  has  no  apparent  critical  point  in  the  range  considered. 
Conceptually,  censored  paths  cannot  provide  more  information  than  the  non-censored  paths, 
and  therefore  the  A  =  (n0+nj+n2)  o2  determines  the  smallest  acceptable  a,  which  yields  RMS 
a  =  0.258,  or  =  0.263  after  the  D.O.F.-adjustment.  Thus  ILS  is  indeed  optimal  in  terms 
of  the  squared  penalty  loss. 
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point  of  view.  Thus  A(3jls)  is  l^e  smallest  achievable  penalty  loss  if  we  limit  the  a  to  be  no 
less  than  A/(n0+ni+n2).  In  this  sense,  ELS  is  also  optimal. 

Figure  2a  plots  the  mj^.  and  mbtLS  versus  m^^  f°r  ^0  events.  In  using  WWSSN 
data,  there  is  only  a  narrow  magnitude  range  from  about  5.7  to  6.4  in  which  the  censoring 
effects  of  clipping  or  non-detection  do  not  lead  to  serious  bias  (Chan  et  al. ,  1988),  in  accord 
with  many  network  studies  (Ringdal,  1986;  Clark,  1983;  Liiwall,  1986;  Lilwall  et  al. ,  1988; 
Christoffersson  and  Ringdal,  1981,  Chinnery,  1978;  Everndon  and  Kohler,  1976).  The 
discrepancies  between  mbiLs  and  m^^  are  always  slightly  smaller  than  those  between  m^^ 
and  m^  .  Basically  this  is  because  &jls  is  smaller  than  dMLE,  and  hence  the  ELS  tends  to 
underestimate  somewhat  the  bias  compensation  in  the  computation  of  conditional  expectations, 
as  compared  to  the  MLE  (Figure  2b).  The  same  observation  can  be  made  with  the  receiver 
effects  of  127  stations  (Figures  3a  and  3b).  Figure  3c  plots  the  histogram  of  receiver  effects 
estimated  with  the  UsMF,  the  MLE,  and  the  ELS  method  using  bin  width  0.1  unit. 

The  uncertainty  estimates  of  127  stations  and  328  phases  computed  with  the  four 
aforementioned  techniques  are  plotted  in  figures  4a  and  4b.  The  results  of  600  iterations  of 
the  ELS  (or  the  MLE)  and  the  bootstrap  show  a  very  clear  linear  relationship  between 
log(Vnumber  of  observations)  and  log(uncertainty  estimate)  (Figures  5a  and  5b).  This  means 
that  the  uncertainty  in  each  estimated  parameter  is  completely  determined  by  the  total  number 
of  observations  associated  with  that  parameter,  provided  that  the  global  a  is  known.  If  dlLS  or 
dMLE  is  used,  instead  of  the  true  a,  then  a  constant  multiplicative  adjustment  in  the  uncer¬ 
tainty  estimates  is  necessary  (Figures  5a  and  5b).  Conversely,  the  global  CTunbjascd  can  be 
stably  estimated  from  any  individual  uncertainty  estimate  (by  multiplying  by  the  square-root 
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3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0 


mb  by  LSMF  (no —  censoring) 

Figure  2a.  The  m^  and  m^  versus  ^or  1 explosions  (328  phases).  There  is  only 

a  narrow  magnitude  range  from  about  5.7  to  6.5  in  which  the  censoring  effects  of  clipping  or 
non-detection  do  not  lead  to  serious  bias.  Compared  to  MLE,  ILS  tends  to  underestimate 
somewhat  the  censoring  effects. 
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Figure  2b.  The  mbwLE  versus  mbiLs  for  110  explosions  (328  phases).  Compared  to  MLE,  ILS 
tends  to  underestimate  somewhat  the  censoring  effects. 
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Receiver  Effects  by  LSMF  (no— censoring) 

Figure  3a.  The  receiver  effects  of  127  WWSSN  stations  indicate  that  ELS  tends  to  underesti 
mate  the  censoring  effects  due  to  clippings  or  non-detections,  same  as  in  Figures  2a  and  2b. 
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Receiver  Effects  of  127  Stations 


Receiver  Effects  by  ILS  (censoring ) 

Figure  3b.  Comparison  of  the  receiver  effects  estimated  by  the  MLE  and  the  ELS. 
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Distribution  of  Receiver  Terms 


Figure  3c.  Histogram  of  the  receiver  effects  with  bin  width  0.1  unit.  Circles,  “+”  signs,  and 
triangles  represents  the  spread  of  the  127  receiver  terms  estimated  with  the  LSMF,  the  MLE, 
and  the  ILS  methods  respectively. 
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I LS -\-Bootstrap  vs.  ILS+'Jftof  Observations 


Uncertainty  Estimated  by  ILS 


Figure  4a.  The  uncertainty  estimated  by  the  ILS  versus  that  by  the  ILS  and  bootstrap.  The 
abscissa  ErrorILS  is  predicted  in  an  analytical  manner  by  dividing  the  djLs  by  the  square  root 
of  number  of  associated  observations.  The  ILS  plus  bootstrap  method  computes  the  ordinate 
with  statistics  from  600  Monte  Carlo  simulations.  The  highly  linear  relationship  between 
these  two  methods  strongly  indicates  that  the  ILS  alone  would  give  a  fairly  good  estimate  of 
the  uncertainty,  except  for  the  constant  magnifying  factor  (i.e. ,  the  slope),  1.450. 
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/ LS  \- Bootstrap  vs.  /LS-W//  of  Observations 


Figure  5a.  The  logjo^#  of  observations  (including  signals,  noises,  and  clips)  versus  the 
log10(uncertainty  estimate).  Filled  circles  represent  the  results  from  600  bootstrap  iterations. 
The  nearly  linear  relationship  (X  +  Y  ~  log10(0.38 1 ))  indicates  that  the  product  of  the 
V#of  observations  and  the  individual  error  estimate  will  be  a  stable  estimator  of  the  true  a  of 
the  obscuring  perturbations.  The  lower  curve  (X  +  Y  =  log10(0.263))  gives  the  predicted 
uncertainty  with  6|[  s. 
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MLE+  Bootstrap  vs.  ML^  +  V//  of  Observations 


Figure  5b.  Same  as  Figure  5a  except  that  the  MLE  is  used  instead  of  the  ELS.  The  linear 
relationship  is  as  obvious  as  in  the  ELS  case.  The  offset  between  the  theoretical  curve  and  the 
bootstrap’s  result  is  smaller  than  that  in  Fig.  5a.  The  upper  line  (X  +  Y  =  log10(0.401))  and 
the  lower  line  (X  +  Y  =  log10(0.329))  will  merge  together  in  the  non-censoring  case. 
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of  the  associated  number  of  observations).  In  this  example,  cImle+b  and  are  0.401  and 

0.381,  respectively,  suggesting  a  magnifying  factor  of  1.218  and  1.450  for  dMLE  and  $ILS, 
respectively,  under  the  assumption  that  bootstrap  gives  a  better  estimate  of  a  (Figures  4a  and 
4b).  In  the  previous  example  of  four  events,  the  correcting  factors  required  were  1.035  (for 
the  MLE)  and  1.058  (for  the  ILS)  (cf.  the  discussion  following  Tables  4  and  5). 

Figures  6a  and  6b  show  the  histogram  of  the  residuals  corresponding  to  the  ILS  and  the 
MLE  with  bin  width  0.1  unit.  Circles,  “+”  signs  and  triangles  represent  the  spread  of  7547 
“regular  residuals”,  5012  “noisy  residuals”,  and  950  “clipped  residuals”,  respectively.  The 
ILS  residuals  are  slightly  more  concentrated  than  those  of  the  MLE,  since  Oils  is  smaller  than 
dMLE.  The  shape  of  the  interpolated  histogram  of  the  regular  residuals  empirically  justifies  the 
idealized  assumption  imposed  on  (1)  that  residuals  follow  a  normal  distribution.  Note  that  the 
“censored  residuals”  are  not  realizations  or  direct  observations  of  the  Gaussian  noise.  They 
represent  the  “thresholds”  in  observing  the  Gaussian  noise.  The  noisy  residuals  are  com¬ 
pactly  clustered  on  the  positive  side,  while  the  clipped  residuals  are  clustered  on  the  negative 
side.  Such  extra  bounding  information  would  of  course  improve  precision  in  estimating  a. 
Note  that  the  ILS  and  the  MLE  are  the  standard  least  squares  methods  (9)  acting  on  the  Y, 
i.e.  the  “refined  observations”  as  defined  in  (7).  If  we  define  the  “refined  residual”  as  the 
discrepancy  between  the  final  conditional  expectation  of  the  censored  observation  (7)  and  the 
regressed  mean,  then  an  equivalent  histogram  of  Figure  6a  or  6b  is  obtained  which  shows 
again  that  the  Gaussian  assumption  is  adequate  (Figure  6c).  This  equivalent  histogram  illus¬ 
trates  quantitatively  the  constructive  effect  of  incorporating  the  censored  information.  In  sta¬ 
tistical  terms,  this  means  that  the  Central  Limit  Theorem  is  also  valid  in  the  case  of  a  doubly 
censored  model  as  considered  in  this  study. 
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-3-2-1  0  1  2  3 

ILS  Residual 


Figure  6a.  The  histogram  of  the  ILS  residuals  with  bin  width  0.1  unit.  Circles,  “+”  signs, 
and  triangles  represents  the  spread  of  7547  “regular  residuals”,  5012  “noisy  residuals”,  and 
950  “clipped  residuals”,  respectively.  The  ELS  residuals  are  slightly  more  concentrated  than 
that  of  MLE,  because  is  smaller  than  dMLE.  The  shape  of  the  interpolated  histogram 
empirically  justifies  the  idealized  assumption  that  residuals  follow  a  normal  distribution. 
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Distribution  of  MLE  Residuals 


MLE  Residual 


Figure  6b.  The  histogram  of  the  MLE  residuals  (See  Figure  6a  for  caption). 
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-3-2-1  0  1  2  3 

Refined  Residual  by  ILS  &  MLE 


Figure  6c,  The  “refined”  histogram  showing  the  effect  of  incorporating  the  censored  infor¬ 
mation.  Here  we  replaced  each  censored  residual  by  its  “refined”  form,  which  is  defined  as 
the  discrepancy  between  the  final  conditional  expectation  of  the  censored  observation  and  the 
regressed  mean.  The  interpolated  histograms  illustrate  again  that  the  Gaussian  assumption  is 
adequate. 
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6.  DISCUSSION  AND  CONCLUSIONS 

In  this  study,  we  have  briefly  reviewed  several  methods  used  in  network  magnitude  esti¬ 
mation,  and  we  have  illustrated  that  the  uncertainty  in  the  parameter  estimates  can  be  easily 
estimated  by  scaling  the  standard  deviation  of  the  perturbing  noises.  The  Iterative  Least- 
Squares  method  is  very  similar  to  the  EM  algorithm  for  the  Maximum-Likelihood  Estimator 
in  that  the  same  routine  can  be  used  except  a  minor  modification  to  the  calculation  of  o  (cf. , 
(13)  and  (16))  (Aitkin,  1981).  Even  though  neither  of  these  two  estimators  is  truly  unbiased 
for  a  in  statistical  sense,  the  MLE  has  attracted  more  attention  in  the  seismological  commun¬ 
ity,  at  least  as  reflected  by  the  literature  to  date. 

The  original  ILS  scheme  (Schmee  and  Hahn,  1979;  Aitkin,  1981)  applies  the  D.O.F.- 
adjustment  (14)  on  or  within  each  iteration  loop,  i.e.  they  look  for  the  a  satisfying  A  = 
(npath-nq)a2,  while  the  present  scheme  suggests  doing  the  D.O.F.-adjustment  in  an  off-line 
sense  for  consistency  with  the  MLE.  Schmee  and  Hahn’s  original  scheme  yields  a  result  of 
0.264  for  rijLs,  909.8  for  A,  and  -4677  for  the  log-likelihood  function  in  Example  2,  just 
slightly  different  from  ours.  For  regression  problems  with  small  sample  size  such  as  that  in 
Schmee  and  Hahn  (1979)  and  Example  1  considered  here,  the  discrepancy  between  0^  and 
0Mle  *s  insignificant.  However,  it  is  not  true  for  large  sample  cases  such  as  Example  2  of  this 
study. 

A  few  comments  on  the  bootstrap  method  are  worth  making:  the  bootstrap  method  is 
computationally  inefficient  since  it  usually  requires  hundreds  of  MLE  or  ELS  calls.  The  more 
data  (and  hence  more  residuals)  that  are  used,  the  more  Monte  Carlo  computations  are 
required.  The  illustrative  example  given  in  Tables  4  and  5  is  nearly  the  extreme  case,  since 


Teledyne  Geotech 


37 


June  1988 


Iterative  Network  m.  Estimators  With  Censored  Data  TGAL-88-06 

b 

the  total  number  of  unknowns  is  not  too  large.  If  there  are  thousands  of  residuals  as  in  the 
second  example,  then  the  bootstrap  method  would  consume  copious  computational  time. 
Secondly,  we  have  shown  that  each  of  the  ILS  and  the  EM  algorithms  will  provide  its  own 
error  estimates  in  a  “quick  and  easy”  manner.  If  the  possibly  biased  estimate  of  o  is  an 
acceptable  approximation,  then  a  single  call  of  the  ELS  or  the  MLE  will  save  much  computa¬ 
tional  time.  Thirdly,  the  mb  and  station  estimates  obtained  from  the  bootstrap  method  would 
not  be  exacdy  identical  to  the  LSMF  (or  the  MLE,  the  ELS)  even  in  the  non-censoring  case 
(cf. ,  Tables  4  and  5).  At  best,  it  regenerates  mb  estimates  asymptotic  to  the  adopted  estimator, 
while  the  “exact”  result  (Table  3)  must  have  existed  in  order  to  start  the  bootstrap  loop.  Ear¬ 
lier  work  on  the  network  (or  single  event)  mb  estimation  (e.g.  McLaughlin  et  al. ,  1986a, 
1986b;  McLaughlin,  1988)  uses  the  bootstrap  only  for  error  estimation,  while  the  mb’s  and  the 
station  terms  were  actually  obtained  from  a  separate  call  of  the  EM  method.  Efron  and 
Tibshirani  (1985)  have  shown  that,  for  the  simplest  non-censored  case,  the  bootstrap  method 
is  asymptotically  correct  yet  redundant  in  assessing  the  uncertainty  in  the  sample  mean.  The 
same  conclusion  seems  to  be  valid  even  in  the  censored  case,  as  illustrated  in  this  study. 
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